There are many software database management systems available on
general-purpose computers ranging from micros to super-mainframes,
with distinct functionalities such as relational vs. hierarchical
and text-retrieval vs. formatted-data-retrieval-and-update. Do we
really need special-purpose machines for database management? In
particular, there is Grosch's law, which says the following:

"Whenever the capacity of a mainframe computer is saturated with
the present work load, there is always another more powerful
mainframe which can support the work load cost-effectively (with
spare capacity)."

For example, if a computer system such as the IBM 3033 is
saturated with database management tasks and the database on the
IBM 2314 disks has grown to capacity, we can replace the IBM 3033
CPU with an IBM 3081 CPU and the IBM 2314 disks with IBM 3340 or
3330 disks. To this example, Grosch's law may apply. However,
Grosch's law does not always work. Consider the next example.
The presence of a communications frontend computer can offload the
communications work from the mainframe cost-effectively so that,
instead of replacing the present mainframe with a more powerful
model because of heavy communications, we can retain the mainframe
longer. As it has turned out, the communications frontend provides,
in addition to cost-effectiveness, improved performance and new
functionality (e.g., serving as a gateway to networks). In other
words, Grosch's law fails to apply only if the special-purpose
computer, which offloads certain types of work from the mainframe,
can provide lower cost, higher performance, and newer
functionality.

Database machines as backend computers can offload the database
management work from the mainframe so that we can retain the same
mainframe longer. However, the database backend must also
demonstrate lower cost, higher performance, and newer
functionality.

How  to  Keep  the  Cost  Low? 

From the technological viewpoint, the database machine should not
be built with distant or expensive technologies such as very large
associative arrays. The database machine should utilize existing
and improved technologies such as VLSI and parallel transfer disks.
From the architectural viewpoint, the database machine should not
insist on a pure architecture such as the cellular machine
architecture. Since it is a special-purpose machine for database
management, we should first characterize its database management
functions and then realize these functions in the hardware
directly. Instead of applying a single architectural principle to
the entire machine, such as applying the pipelining principle to
come up with a pipelined machine, we should apply architectural
principles such as pipelining, concurrency and parallelism to the
various design levels of the database machine. From the viewpoint
of transforming the software techniques for database management
into hardware database machine components, we should not use
unproven or seldom-practiced software techniques such as data
pools [1]. Instead, we should transform proven software techniques
such as indexing and clustering into hardware.


How to Keep the Performance High?


Let us take a look at the following gross aggregates of a database
management system (DBMS):

[Figure: gross aggregates of a DBMS; surviving labels: Mainframe(s),
User(s)]

First Hypothesis (observation): No matter how great the amount of
data to be processed by the database processors, we can always
process the data at the rate at which they are being received.


Second Hypothesis: No matter how complex the DBMS software is,
there is always a hardware architecture which can cause the
execution of the DBMS to be I/O-bound.

The aforementioned two hypotheses, i.e., my observations, have
important consequences. In order to articulate their importance,
these consequences are expressed as corollaries.

Corollary One: Building very fast and massive database processors
is not a difficult task.

Corollary Two: Supporting communications interfaces, pre-processing
database transactions and executing DBMS software do not affect the
performance of the machine.

Corollary Three: The performance of the machine is proportional to
the rate at which data can be moved in and out of the database
stores.

What these three corollaries say is that the performance of a
database machine hinges on the 'I/O bandwidth' between the database
stores and the database processors. The machine performance has
little to do with the amount of processing that the database
processors must perform, since we know how to build fast
processors.

How  can  we  achieve  a  very  high  rate  of  data  movement  between 
the  database  stores  and  the  database  processors? 

Solution One: At the device level, we may, for example, use
parallel read-out-and-write-in disks and simultaneous
read-out-and-write-in drives. In other words, we may have parallel
data streams coming out of and going into an individual disk. In
addition, we may have many such disks moving parallel data streams
in and out of the drives simultaneously.

Solution Two: At the system level, we process the incoming or
outgoing data streams separately and in parallel, at the speed of
data movement, with minimal communications among the processors.

In other words, we do not merge data streams before processing,
since the merged stream would have to move faster and be processed
sooner. Nor do we increase the traffic in inter-processor
communications or rely on complex communications networks. With
these solutions, we arrive at some important consequences for
machine performance and architecture. They are listed below.
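
As a minimal sketch of Solution Two, the following Python fragment
assigns one worker to each data stream and never merges the streams
before processing. The function names, the record layout and the
selection predicate are hypothetical and serve only as illustration;
threads stand in for the separate database processors.

```python
from concurrent.futures import ThreadPoolExecutor


def process_stream(stream, predicate):
    """Work of one database processor: scan its own stream only."""
    return [record for record in stream if predicate(record)]


def process_streams_separately(streams, predicate):
    """Give each stream its own worker; workers exchange no data."""
    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        futures = [pool.submit(process_stream, s, predicate) for s in streams]
        # Results are gathered only after each stream has been processed.
        return [f.result() for f in futures]


# Hypothetical data: three streams, e.g. one per parallel read-out path.
streams = [
    [{"id": 1, "salary": 900}, {"id": 2, "salary": 1500}],
    [{"id": 3, "salary": 2000}],
    [{"id": 4, "salary": 700}, {"id": 5, "salary": 1800}],
]
selected = process_streams_separately(streams, lambda r: r["salary"] > 1000)
```

Each worker sees only the records of its own stream, so no worker
has to run faster than a single stream delivers data.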

Consequence One: The effective rate of data movement increases with
the degree of parallelism and simultaneity of the database stores'
read-out and write-in capabilities.

Consequence Two: The processing power and processing speed of the
individual database processors are only required to keep up with
the rate of data movement of a single data stream and do not need
to keep up with the effective rate of data movement.

Consequence Three: Due to the previous consequences, it follows
that the use of multiple cheaper processors and local memories is
possible for sustaining high performance; that engineering changes
to the disk's I/O bus structures and triggering mechanisms are
required (but no change to the read/write heads); and that a
redesign of the disk controller, incorporating multiple processors
and their local memories for the multiple data buses, is required.
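
The arithmetic behind Consequences One and Two can be sketched as
follows. This is only an illustration under a simplifying
assumption that is not stated in the text: all streams move at the
same rate and the effective rate scales linearly with their number.
The figures (16 streams at 1.0 MB/s) are hypothetical.

```python
def effective_rate(streams: int, per_stream_rate: float) -> float:
    """Consequence One: the aggregate rate grows with the parallelism."""
    return streams * per_stream_rate


def required_processor_rate(per_stream_rate: float) -> float:
    """Consequence Two: a processor keeps pace with one stream only."""
    return per_stream_rate


# Hypothetical figures: 16 parallel streams at 1.0 MB/s each.
stream_count, stream_rate = 16, 1.0
aggregate = effective_rate(stream_count, stream_rate)
per_processor = required_processor_rate(stream_rate)

# A single processor fed by a merged stream would instead have to
# sustain the full aggregate rate.
merged_requirement = aggregate
```

Under these assumptions the machine moves 16.0 MB/s in total while
each processor only has to sustain 1.0 MB/s, which is why cheaper
processors suffice.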

Third Hypothesis: The processing of meta data such as catalogs,
directories and schemas and the processing of raw data such as
records, attributes and values are different in nature, scope and
sequence.

Corollary Four: The meta data and the raw data should have separate
stores and processors. Furthermore, the processing of the meta data
should be made concurrent with the processing of the raw data.

Consequence Four: The design of meta data stores and processors and
the design of raw data stores and processors may be different and
specially tailored for achieving concurrent processing of both
types of data.

Database practitioners do appreciate the differences between the
meta data and the raw data of a database. They also appreciate the
different processing and storage requirements of these two types of
data. They should be pleased that the architects of future database
machines take these differences into consideration in the design.

How to Provide Newer Functionality?


Fourth Hypothesis: Presently, every DBMS is model-specific, which
implies that it is language-specific and, in turn,
application-specific.

By model-specific, we mean that the DBMS is based on a single data
model. For example, the IBM IMS database system is based on the
hierarchical model. Consequently, DL/1 is a hierarchical language
and all the application programs written for IMS are in the DL/1
language.

Solution Three: The new functionality of a future database machine
lies in its capability to support multiple data models (and
therefore multiple data languages and applications).

Corollary Five: The future database machine looks like, for
example, a relational machine to the relational database user, a
hierarchical machine to the hierarchical database user, a Codasyl
machine to the Codasyl database user, and a new machine to the new
database user.

Corollary Six: A single machine can support various model-specific
databases and languages; or, there are many machines each of which
can support a model-specific database and language.

How do we go about designing and implementing a database machine
which will support many models?

Solution Four: Come up with a database kernel (or kernel machine)
which takes care of all the access and update operations of the raw
data and the meta data, which allows 'natural' mappings of
model-specific languages to the machine language of the kernel, and
which provides a model-general database structure for various
model-specific database organizations.
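
The division of labor in Solution Four can be sketched in Python as
follows. Every class, method and attribute name here is
hypothetical, and the attribute-value record format is only an
assumed stand-in for the model-general database structure. The
sketch shows a relational and a hierarchical interface mapping their
own operations onto the same two kernel operations.

```python
class Kernel:
    """Hypothetical kernel: stores model-general attribute-value records."""

    def __init__(self):
        self.records = []

    def insert(self, record):
        self.records.append(dict(record))

    def retrieve(self, conditions):
        """Return records whose attributes equal every given value."""
        return [r for r in self.records
                if all(r.get(k) == v for k, v in conditions.items())]


class RelationalInterface:
    """Maps relation-style requests onto kernel operations."""

    def __init__(self, kernel):
        self.kernel = kernel

    def insert_tuple(self, relation, row):
        self.kernel.insert({"_relation": relation, **row})

    def select(self, relation, **where):
        return self.kernel.retrieve({"_relation": relation, **where})


class HierarchicalInterface:
    """Maps segment-style requests onto the same kernel operations."""

    def __init__(self, kernel):
        self.kernel = kernel

    def insert_segment(self, segment_type, key, parent_key=None, **fields):
        self.kernel.insert({"_segment": segment_type, "_key": key,
                            "_parent": parent_key, **fields})

    def get_children(self, segment_type, parent_key):
        return self.kernel.retrieve({"_segment": segment_type,
                                     "_parent": parent_key})


# Both interfaces share one kernel and one record store.
kernel = Kernel()
relational = RelationalInterface(kernel)
hierarchical = HierarchicalInterface(kernel)

relational.insert_tuple("EMP", {"name": "Ann", "dept": "D1"})
relational.insert_tuple("EMP", {"name": "Bob", "dept": "D2"})
hierarchical.insert_segment("DEPT", "D1")
hierarchical.insert_segment("EMP", "E1", parent_key="D1", name="Ann")

d1_staff = relational.select("EMP", dept="D1")
d1_children = hierarchical.get_children("EMP", "D1")
```

The interfaces only translate requests; all access to stored data
goes through the kernel, which is where the data-intensive work
would be concentrated.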

Fifth Hypothesis: It is possible to come up with a low-cost and
high-performance database kernel machine which takes care of all
the data-intensive operations [2,3,4].

Sixth Hypothesis: It is also possible to discover natural mappings
of model-specific languages to the machine language of the kernel.
These mappings are computation-intensive and are not data-intensive
[5,6,7,8].

Corollary Seven: The mapping software (i.e., the model-specific
software interface) can be quickly executed by the database
processors.

Consequence Five: The support of multiple model-specific languages
has little impact on the performance and cost of the database
machine.


Conclusions: On the basis of these hypotheses, corollaries,
consequences and solutions, the future of database machines is
bright, since these are sound hypotheses, reasonable corollaries
and good solutions. We believe that the future database machine can
be cost-effective with high performance. It can also have new
functionality.